Real-time high-resolution three-dimensional (3D) reconstruction of scenes hidden from the direct field
of view is a challenging field of research, with applications in real-life situations related, e.g., to surveil- 
lance, self-driving cars, and rescue missions. Most current techniques recover the 3D structure of a 
non-line-of-sight (NLOS) static scene by detecting the return signal from the hidden object on a scat- 
tering observation area. Here, we demonstrate the full color retrieval of the 3D shape of a hidden scene by 
coupling back-projection imaging algorithms with the high-resolution time-of-flight information provided 
by a single-pixel camera. By using a high-efficiency single-photon avalanche-diode (SPAD) detector, this 
technique provides the advantage of imaging with no mechanical scanning parts, with acquisition times
down to the subsecond range.

The identification of scenes hidden from the direct line
of sight, as happens for objects hidden behind an occluder 
or wall, is a challenging imaging task, with applications 
in defence, surveillance, and self-driving vehicles [1]. 
Non-line-of-sight (NLOS) imaging has been demonstrated 
by using radar systems [2], wave-front shaping [3], and 
speckle correlation [1,4,5] and, more recently, even passively,
by capturing light originating from behind a wall with
an ordinary digital camera [6]. Most approaches
in this field have demonstrated how to identify the hidden
scene by collecting the light scattered back by hidden
objects with a system similar to light detection and ranging
(LIDAR), using the time-of-flight information of
the back-scattered signal [7-11]. This technique typically
involves a pulsed laser beam pointed on a scattering sur- 
face, producing a spherical wave propagating into the 
hidden scene. When the spherical wave hits the hidden 
object, the light is then scattered back toward the scattering 
surface. Collection of the third-bounce echo scattered from 
the hidden object allows the detection and identification 
of the hidden scene by advanced three-dimensional (3D) 
reconstruction algorithms [12]. Past results have demon- 
strated how this technique can be used for tracking a 
moving hidden object even over large distances [13,14] 
and for the retrieval of the 3D shape of a static hid- 
den object by using back-projection imaging algorithms 
[12,15] or ellipsoid-mode decomposition for multiple
hidden objects [16]. Alternative methods aimed at 
simplifying or increasing the speed of LIDAR-like NLOS 
imaging rely on two-dimensional (2D) continuous illumi- 
nation [17], deep learning [18], and confocal illumination 
and/or collection [19]. The multiple-bounce back-scattered 
signal is typically very weak, so these techniques require 
high temporal resolution and high-speed single-photon 
cameras with a high detection efficiency. 

However, one of the main limitations of current NLOS 
imaging systems is the finite temporal resolution of the 
detectors, which in turn determines the spatial resolution of 
the retrieval and thus the ability to reconstruct the 3D
structure of the hidden scene satisfactorily. Furthermore, the
complete 3D retrieval of the hidden scene often requires
acquisition times that are prohibitive for the 3D imaging of a
moving object, although progress is being made to reduce the
acquisition and reconstruction times [6,19], with the goal
of reaching times in the second or even subsecond range.
Additionally, the spatial resolution of the retrieval can 
be improved by using iterative back-projection algorithms 
that, however, typically require longer computational times 
[20,21]. 

Another rapidly evolving imaging technique is based on 
so-called single-pixel cameras [22]. Standard 2D single- 
pixel imaging systems recover images by projecting an 
array of patterns onto the scene and detecting only the 
total reflected- or transmitted-light intensity, for which a 
single pixel is sufficient. The computational image recon- 
struction can then be achieved by computing a weighted 
sum of all the illumination patterns, where the weight
of each pattern is given by the corresponding measured 
intensity level. This technique can also be extended for 
use in, e.g., microscopy [23], imaging through scattering 
media [24], and terahertz imaging [25] or full 3D LIDAR 
[26-28]. 
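
The weighted-sum reconstruction described above can be illustrated with a minimal sketch. It assumes ideal, noiseless measurements and uses Hadamard patterns with entries +1 and -1 built by the Sylvester construction; the 4 x 4 scene values and all function names are hypothetical and chosen only for illustration. A physical mask cannot display negative entries, so in practice each such pattern would be obtained from two binary masks.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def measure(scene, pattern):
    """Bucket-detector signal: total intensity of the scene seen through one pattern."""
    return sum(s * p for s, p in zip(scene, pattern))


def reconstruct(patterns, signals):
    """Weighted sum of the patterns, each weighted by its measured signal."""
    n = len(patterns[0])
    image = [0.0] * n
    for pattern, signal in zip(patterns, signals):
        for i, p in enumerate(pattern):
            image[i] += signal * p
    # Rows of a Hadamard matrix are mutually orthogonal with squared norm n,
    # so dividing by n turns the weighted sum into an exact inverse.
    return [v / n for v in image]


# Hypothetical 4 x 4 scene, flattened row by row (values are made up).
scene = [0, 0, 0, 0,
         0, 5, 3, 0,
         0, 2, 7, 0,
         0, 0, 0, 0]

patterns = hadamard(len(scene))
signals = [measure(scene, p) for p in patterns]
recovered = reconstruct(patterns, signals)
print(recovered)
```

With a complete orthogonal pattern set the weighted sum returns the scene exactly; with fewer patterns than pixels it gives only an approximation.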

Although this technique requires many consecutive 
measurements, the consequent long acquisition time 
required for high-spatial-resolution images can be sig- 
nificantly reduced by applying compressive sensing 
[25,29,30]. More importantly, the technique provides more 
flexibility in choosing the optimal (single-pixel) detector 
for the imaging challenge being addressed. Of relevance 
to this work, this implies that we can choose a single- 
pixel detector with an enhanced temporal response and 
build upon the typically better temporal resolutions that are 
available in single-pixel format (timing resolutions down 
to picoseconds), when compared to technologies involv- 
ing camera and/or single-photon avalanche-diode (SPAD) 
arrays. 

In this work, we demonstrate full 3D retrieval of hidden 
scenes by using a time-resolved single-pixel camera. The 
choice of a single-pixel-camera approach allows us to use 
optimized (sub-30-ps impulse-response function) single-photon
detectors in combination with a digital micromirror
device (DMD) so as to remove the need for any scanning
components while building upon the 20-kHz refresh
rate of the DMD and the high single-photon sensitivity to 
reduce acquisition times with good reconstruction fidelity. 
By employing a white-light laser, we extend the technique 
to achieve the full red-green-blue (RGB) color retrieval 
of noncooperative hidden objects and by choosing high- 
efficiency SPADs we also achieve subsecond acquisition 
times. 

Figure 1(a) shows the experimental setup: the single-pixel
camera is composed of a camera-lens objective
(8 mm focal length, f/3.5) that images a 50 x 50 cm²
portion of a scattering wall onto a DMD (placed 1.16
m from the wall). The DMD then projects only selected
portions (masks) of the image onto a single-pixel single-photon
detector through a microscope objective. The
single-photon detector therefore acts as a bucket detector
and has a 30-ps impulse-response time and an area
of 57 x 57 µm². The DMD masks have 20 x 20 pixels,
corresponding to 2.6 x 2.6 cm² pixel areas on the
scattering wall. The single-photon data are recorded
in time-correlated-single-photon-counting (TCSPC) mode 
triggered by the illumination laser, as a histogram of pho- 
ton arrival times, with 4096 time bins of 6.1-ps duration 
each. The laser is directed on the scattering wall 10 cm 
to the right of the field of view of the single-pixel cam- 
era, producing a first scattered spherical wave. Part of the 
spherical wave hits the hidden object, which in turn scat- 
ters back into the field of view, where it is captured by 
the time-resolved single-pixel camera.

FIG. 1. The schematics of the experimental setup for NLOS
imaging. (a) A pulsed laser scatters on a scattering surface,
producing a spherical wave in all the surrounding area. The
DMD images a 50 x 50 cm² area of the scattering surface
and the time-resolved single-pixel camera collects the signal
back-scattered from the target in the hidden scene as temporal
histograms. We collect the signal scattered back by the hidden
target by sequentially collecting light from each of the 20 x 20
pixels either by raster-scan masks or by Hadamard masks. (b)
The noncooperative single-object scene used for ultrafast NLOS
imaging.

The intensity at a given pixel (x', y', z' = 0) on the observation
area at an arrival time t is given by the voxels of the object that could
contribute to the signal, described as follows:


$$
I(x', y', z' = 0, t) = \int_{x,y,z} \frac{I_0\, \delta(tc - r_\ell - r_{op})}{r_\ell^2\, r_{op}^2}\, dx\, dy\, dz, \tag{1}
$$
where $I_0$ is the initial intensity of the spot on the wall and
the delta function describes the propagation of the light
from the laser spot to the target and back to the observed
pixel. Here, $tc$ is the distance covered by light traveling at
speed $c$ in time $t$, $r_\ell(x,y,z) = \sqrt{x^2 + y^2 + z^2}$ is the
distance between the object voxels and the laser spot, and
$r_{op}(x,y,z,x',y',z'=0) = \sqrt{(x - x')^2 + (y - y')^2 + z^2}$ is
the distance between the portion of the object and the
observed pixel. The $\delta$ term encodes the object shape
through the time-of-flight signal created in reflection from
each coordinate point of the object and the $1/r^2$ terms
encode the intensity decay with distance due to diffusive
reflection from the wall and object.
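
To make the geometry of this forward model concrete, the following sketch evaluates the two path lengths, the resulting arrival-time bin, and the distance weighting for a single object point. It assumes that the laser spot is the origin of the coordinate system (as implied by the definition of the first distance), that the wall is the plane z = 0, that time zero is the moment the laser hits the wall, and that the fixed wall-to-detector delay has been calibrated out. The function names and the example coordinates are hypothetical; only the 6.1-ps bin width is taken from the text.

```python
import math

C_CM_PER_PS = 0.0299792458  # speed of light in cm/ps
BIN_PS = 6.1                # histogram bin width quoted in the text


def path_lengths_cm(voxel, pixel):
    """Return (laser spot -> voxel, voxel -> wall pixel) distances in cm."""
    x, y, z = voxel
    xp, yp = pixel  # the observed pixel lies on the wall, z' = 0
    r_l = math.sqrt(x ** 2 + y ** 2 + z ** 2)
    r_op = math.sqrt((x - xp) ** 2 + (y - yp) ** 2 + z ** 2)
    return r_l, r_op


def arrival_bin(voxel, pixel):
    """Histogram bin in which the echo from this voxel reaches this pixel."""
    r_l, r_op = path_lengths_cm(voxel, pixel)
    t_ps = (r_l + r_op) / C_CM_PER_PS  # the delta function enforces t*c = r_l + r_op
    return int(t_ps // BIN_PS)


def intensity_weight(voxel, pixel, i0=1.0):
    """Inverse-square decay over both paths, as in the integrand of Eq. (1)."""
    r_l, r_op = path_lengths_cm(voxel, pixel)
    return i0 / (r_l ** 2 * r_op ** 2)


# Hypothetical object point 50 cm from the wall, observed 30 cm to the left
# of the laser spot.
voxel = (0.0, 0.0, 50.0)
pixel = (-30.0, 0.0)
print(path_lengths_cm(voxel, pixel), arrival_bin(voxel, pixel))
```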

FIG. 2. A cooperative two-object scenario for a NLOS single-pixel
camera based on a PMT detector. (a) The return signal
produced by the two hidden objects and retrieved by raster-scan
imaging onto the DMD and collection of the signal on the single-pixel
detector in the time frame t = 7.1 ns. (c),(d) 3D shape
retrieval of the hidden scene in (c) the x-y plane and (d) the x-z
plane after applying a back-projection imaging algorithm. The
dotted line represents the actual position of the targets. To facilitate
the visualization, the reflectivity is normalized to the local
maximum value. The objects to be recovered are two round targets
of 2.54 cm and 7.62 cm diameter at a varying depth and tilted.
The voxel dimension is 0.2 x 0.2 x 1 cm³ for an investigated
volume of 20 x 20 x 100 cm³. The threshold used for the retrieval
is 0.89.

We then explore various imaging scenarios, testing different
objects and DMD mask choices. For the first scenario,
we investigate a hidden scene of two cooperative
objects, i.e., a highly reflective tin-foil 7.62-cm-diameter
cylinder and a 2.54-cm-diameter mirror, placed at different
positions and distances from the wall [see Fig. 2(b)]. We
use a 120-fs pulsed laser at 808 nm wavelength with a rep- 
etition rate of 80 MHz and an average power of 800 mW. 
We collect the third-bounce echo with a simple raster- 
scan acquisition on the DMD, one 2.6 x 2.6 cm? imaging 
pixel at a time, and collect the reflected light onto a photo- 
multiplier tube (PMT, hybrid photodetector HPM-100-07, 
Becker & Hickl, 4% efficiency at 808 nm) with optimized 
temporal response (the measured total impulse-response 
time is 27 ps FWHM). The acquisition time is set by the 
amount of the back-scattered signal detected by the sen- 
sor. In this case, the acquisition time is 10 s per mask (i.e., 
pixel) for a total acquisition time of 66 min. Figure 2(a) 
shows one time frame of the collected third-bounce echoes 
of the two hidden objects. 
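
The acquisition-time budget of the raster scan follows directly from the number of masks and the dwell time per mask. The short sketch below reproduces this arithmetic; the function name is hypothetical and the inputs are the 20 x 20 masks and 10 s per mask quoted above.

```python
def total_acquisition_s(n_masks, dwell_s):
    """Total acquisition time for sequential masks with a fixed dwell time."""
    return n_masks * dwell_s


raster_masks = 20 * 20  # one mask per imaging pixel on the wall
raster_total_s = total_acquisition_s(raster_masks, 10.0)
print(raster_total_s, "s =", raster_total_s / 60.0, "min")
```

The result, 4000 s, is slightly above 66 min, in line with the value quoted in the text.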

We proceed with the retrieval of the 3D shape of 
the hidden objects by applying the back-projection imag- 
ing algorithm first introduced by Velten et al. [12,20]. 
Although faster retrieval methods are available (see, e.g., 
Ref. [15]), our primary focus here is on the hardware rather 
than on the retrieval software. Regarding the time-of-flight 
evaluation, we consider the moment when the laser hits
the wall as the zero time reference. We therefore divide
the 3D space into $10^6$ voxels and we calculate the likelihood
of the target being localized on each voxel by using
the time of flight $t$ that the light takes to cover the distance
$r_\ell + r_{op}$. The relation $ct = r_\ell + r_{op}$ indicates that
all the possible contributions to a given pixel lie on the surface
of an ellipsoid the foci of which are the laser spot and
the pixel position. The ensemble of the 400 temporal histograms
overlaps at the scattering object position and thus
encodes the 3D geometry of the hidden objects. We assign 
the likelihood of a voxel by summing the intensity of the 
pixels that could have received any contribution from the 
voxel. Following Ref. [12], the final 3D shape of the object 
is improved by applying a Laplacian filter followed by a 
threshold selection on the data along the z direction of the
voxel grid. This thresholding is applied to remove artifacts,
e.g., blurring of the reconstructed object due to the limited
projection angles of the imaging system, that make a
relatively small contribution to the voxel probability distribution
[16,31]. It is important to mention that artifacts
can appear in the 3D reconstructed scene, mostly due to
the ill-posedness of the inverse problem induced by the
geometry of the sensing strategy (the limited field of view).
Such artifacts can include blurred 3D structures and spurious
objects not actually present in the scene. A high timing
resolution (as used in this work) definitely improves
the reconstruction performance and the proposed thresholding
approach also seems robust to artifacts. However,
more complex and robust computational methods might 
be needed for more challenging NLOS imaging scenarios, 
e.g., more complex objects, further away from the wall. 
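
The voxel-likelihood step described above can be sketched as follows. This is a simplified illustration and not the implementation used for the reported results: it omits the Laplacian filter and the thresholding, uses a coarse voxel grid, and simulates noiseless data from a single hypothetical point target. The geometry (laser spot at the origin, a 20 x 20 grid of 2.6-cm pixels starting 10 cm to the left of the laser spot) and all names are assumptions made for the example.

```python
import math

C_CM_PER_PS = 0.0299792458  # speed of light in cm/ps
BIN_PS = 6.1                # histogram bin width from the text
N_BINS = 4096               # number of time bins from the text


def bin_index(voxel, pixel):
    """Time bin of the third-bounce echo for one voxel and one wall pixel."""
    x, y, z = voxel
    px, py = pixel
    r_l = math.sqrt(x * x + y * y + z * z)                   # laser spot -> voxel
    r_op = math.sqrt((x - px) ** 2 + (y - py) ** 2 + z * z)  # voxel -> pixel
    return int((r_l + r_op) / C_CM_PER_PS // BIN_PS)


def back_project(histograms, pixels, voxels):
    """For every voxel, sum the counts of all pixels at the matching time bin."""
    scores = []
    for voxel in voxels:
        total = 0
        for hist, pixel in zip(histograms, pixels):
            b = bin_index(voxel, pixel)
            if b < len(hist):
                total += hist[b]
        scores.append(total)
    return scores


# Assumed wall-pixel centers (cm), with the laser spot at the origin.
pixels = [(-10.0 - 2.6 * (i + 0.5), 2.6 * (j - 9.5))
          for i in range(20) for j in range(20)]

# Hypothetical point target and a coarse voxel grid that contains it.
target = (-20.0, 5.0, 40.0)
voxels = [(float(x), float(y), float(z))
          for x in range(-40, 1, 5)
          for y in range(-20, 21, 5)
          for z in range(20, 61, 5)]

# Simulated noiseless data: one count per pixel at the target's time bin.
histograms = []
for pixel in pixels:
    hist = [0] * N_BINS
    hist[bin_index(target, pixel)] += 1
    histograms.append(hist)

scores = back_project(histograms, pixels, voxels)
best = voxels[scores.index(max(scores))]
print(best, max(scores))
```

Each pixel contributes to every voxel lying on its ellipsoid, so only the true target position collects a contribution from all 400 histograms; the other voxels receive the partial overlaps that the filtering and thresholding steps are meant to suppress.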

Figures 2(c) and 2(d) show the results on the x-y plane 
and on the x-z plane, respectively. The dotted line in 
each figure indicates the actual positions of the targets. 
Our results show that this technique provides an accu- 
rate 3D shape recovery of the hidden (cooperative) targets, 
although with relatively long acquisition times. 

We then investigate a hidden scene of noncooperative 
objects with the same setup (Fig. 3). In this case, the scene 
to be recovered is a RGB-colored object, placed outside the 
direct line of sight [Fig. 3(b)], where each colored region 
has a rectangular shape of 20 x 9 cm². In order to retrieve
the color information, we use a supercontinuum laser
(SuperK EXTREME/FIANIUM, NKT Photonics, repetition
rate 67 MHz, pulse duration approximately 10 ps,
average power 100 mW in the 450-700 nm spectral range).
For the RGB retrieval, we run a separate measurement for
each of the three RGB colors, using corresponding bandpass
spectral filters centred at 490, 550, and 610 nm
(40-nm bandwidth) after the laser source, with roughly 20 
mW average power for each color. As above, due to the 
relatively low laser power, the optimal acquisition time is 
found to be 10 s per mask for an overall acquisition time of 
66 min. Figure 3(a) shows the three return signals scattered
back by the three RGB-colored targets at a time frame of
8 ns. Figures 3(c) and 3(d) show the reflectivity on the x-y
plane and on the x-z plane, respectively: as can be seen, the
retrieved 3D scene corresponds very closely to the ground
truth (dotted lines).

FIG. 3. A noncooperative RGB-colored-object scenario for a
NLOS single-pixel camera based on a PMT detector.
(a) The return signal produced by the hidden
object by raster-scan acquisition. (b) The RGB-colored target.
(c),(d) RGB retrieval of the 3D shape of the hidden objects in
(c) the x-y plane and (d) the x-z plane after applying a back-projection
imaging algorithm. The red color corresponds to the
recovered target by using the red spectral filter and so on. The
dotted black line represents the actual position of the target. We
discretize the space in voxels of 1.4 x 1.4 x 1 cm³ for an overall
volume of 140 x 140 x 100 cm³.

The last scenario that we investigate is a noncooperative
hidden object [a white paper rectangle of 24 x 10 cm²;
see Fig. 1(b)], aimed at optimizing the acquisition time. We
achieve high-speed acquisition by using a high-efficiency
(70% peak efficiency at 550 nm) SPAD detector [32] with
a measured impulse-response function of 30 ps FWHM.
The SPAD has a square active area of 57 x 57 µm² and
we use a 75-mm-focal-length lens and a long-working-distance
objective (magnification factor 50) to focus the
light after the DMD onto the sensor. The high sensitivity
of the detector allows a shorter acquisition time of 1 ms
per mask; the masks in this case were chosen as the first 400
Hadamard patterns, with the goal of increasing the amount 
of collected light for each mask (50% of the pixels are 
always projected onto the detector for each mask). For each 
Hadamard pattern, one binary mask and its negative are 
used and combined, leading to a total of 800 patterns. This 
allows the total acquisition time to be reduced to only 0.8 s. 
In this case, we use the same supercontinuum laser as in
the previous scenario, with a power of 550 mW at 550 nm. 
As shown in Fig. 4(a), the collected third-bounce echo of 
the signal is affected by a low signal-to-noise ratio due 
to the short acquisition time. We therefore first apply a 
denoising algorithm similar to that used in Ref. [33], thus 
returning the signal shown in Fig. 4(b). To be precise, a 
cost function is defined, accounting for the forward model 
(including observation noise that is assumed to be Poisson 
distributed) relating the image sequence (or video) reach- 
ing the DMD and the set of temporal sequences recorded 
for each Hadamard pattern. The cost function also includes 
two penalty terms to promote temporal and spatial smooth- 
ness after denoising. This is enforced by using a spatial 
total-variation (TV) regularization, as well as a low-pass 
constraint on the Fourier transform of the temporal inten- 
sity profile of each pixel. This cost function is convex and 
the denoising step, i.e., the cost-function minimization, is 
performed using an alternating direction method of multi- 
pliers (ADMM) algorithm [34], as in Refs. [35] and [36]. 
Figures 4(c) and 4(d) show the retrieved reflectivity of 
the hidden objects on the x-y plane and on the x-z plane, 
respectively. Our results therefore show that this technique
provides an accurate 3D shape of a hidden target even with
subsecond acquisition times, with an average number of
only 1.2 photons per pixel in each time frame (with a maximum
peak photon number of approximately 10 photons
per pixel).

FIG. 4. The ultrafast NLOS single-pixel camera results. (a) The
return signal produced by the hidden object by Hadamard acquisition
at a time frame of 5.3 ns. (b) The return signal obtained
by the preprocessing filter. (c),(d) Retrieval of the 3D shape
of the hidden target in (c) the x-y plane and (d) the x-z plane
after applying a back-projection imaging algorithm. The dotted
black line represents the actual position of the target. The
object to be recovered is a rectangular target of 24 x 10 cm².
The voxel dimension is 1.4 x 1.4 x 1 cm³ for an overall volume of
140 x 140 x 100 cm³. The threshold used for the retrieval is
0.80.
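
The Hadamard acquisition with one binary mask and its negative per pattern can be illustrated with a small sketch. It assumes a 16-pixel toy scene and a Sylvester-type Hadamard matrix, neither of which is taken from the experiment; all names and scene values are hypothetical. The sketch checks that subtracting the two binary-mask signals is equivalent to a single measurement with a +1/-1 pattern, and reproduces the frame count and the 0.8 s total quoted in the text.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def split_mask(row):
    """Split a +1/-1 pattern into a binary mask and its negative."""
    positive = [1 if v > 0 else 0 for v in row]
    negative = [1 - m for m in positive]
    return positive, negative


def bucket(scene, mask):
    """Total signal reaching the single-pixel detector through one mask."""
    return sum(s * m for s, m in zip(scene, mask))


# Hypothetical 16-pixel scene (values are made up).
scene = [0, 0, 0, 0, 0, 5, 3, 0, 0, 2, 7, 0, 0, 0, 0, 0]
rows = hadamard(len(scene))

differential = []    # mask signal minus negative-mask signal
fill_fractions = []  # fraction of pixels sent to the detector by each mask
for row in rows:
    positive, negative = split_mask(row)
    differential.append(bucket(scene, positive) - bucket(scene, negative))
    fill_fractions.append(sum(positive) / len(row))

# Equivalent measurement with the signed pattern itself.
direct = [bucket(scene, row) for row in rows]

# Two DMD frames per pattern; in this toy construction the first row is all
# ones, whereas every other row sends exactly half of the pixels to the detector.
n_frames = 2 * len(rows)

# Values from the text: 800 frames at 1 ms each.
acquisition_s = 800 * 1e-3
print(differential == direct, fill_fractions[1], n_frames, acquisition_s)
```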

In conclusion, the high efficiency and the high tempo- 
ral resolution of single-pixel single-photon detectors allow 
us to accurately recover the 3D shape of hidden objects 
even with low-resolution masks of 20 x 20 pixels and no 
mechanical scanning parts. 

The main limitation on the spatial resolution of the 
retrieval is due to the pixel size on the scattering wall. In 
our case, a pixel size of 2.6 x 2.6 cm² limits the temporal
resolution to 60 ps due to the blurring of the pulse wave 
front as it crosses the 3.6-cm pixel collection area (taking 
the diagonal of the square pixel). This effectively corre- 
sponds to an uncertainty in the arrival time of the return 
pulse and, in analogy with standard LIDAR, will trans- 
late into an uncertainty of the object depth location that 
is half this value, i.e., 1.8 cm. This can be overcome by 
decreasing the pixel size; however, this is achieved at the 
cost of longer acquisition times due to the larger number 
of Hadamard patterns (or pixels to scan). 
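
The numbers in this estimate can be checked with a few lines of arithmetic. The sketch below assumes, as in the text, that the relevant length is the diagonal of the square pixel and that the depth uncertainty is half of it; reading the quoted 60 ps as the light travel time over that depth uncertainty is an interpretation made here, not a statement from the text. The function name is hypothetical.

```python
import math

C_CM_PER_PS = 0.0299792458  # speed of light in cm/ps


def pixel_limited_resolution(pixel_side_cm):
    """Return (pixel diagonal in cm, depth uncertainty in cm, equivalent time in ps)."""
    diagonal_cm = pixel_side_cm * math.sqrt(2.0)
    depth_uncertainty_cm = diagonal_cm / 2.0  # LIDAR-like factor of one half
    equivalent_time_ps = depth_uncertainty_cm / C_CM_PER_PS
    return diagonal_cm, depth_uncertainty_cm, equivalent_time_ps


diagonal_cm, depth_cm, time_ps = pixel_limited_resolution(2.6)
print(round(diagonal_cm, 2), round(depth_cm, 2), round(time_ps, 1))
```

For a 2.6-cm pixel this gives a diagonal of about 3.7 cm, a depth uncertainty of about 1.8 cm, and an equivalent time of about 61 ps, close to the rounded values in the text. Halving the pixel side halves all three quantities but quadruples the number of pixels to be measured.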

Overall, the ability to identify a hidden scene by the 
proposed approach is mainly determined by the time res- 
olution of the detector and by the time required to acquire 
a significant back-scattered signal. The retrieval of hidden
scenes still remains a challenging task, due to long
acquisition times, the low spatial resolution of the retrieval,
and the computational resources required, although significant steps have
been made recently (see, e.g., Refs. [6] and [19]). We show
a NLOS ultrafast imaging technology that can reliably 
recover the 3D shape in color of a scene with high spatial 
resolution by using single-pixel single-photon detectors 
with high temporal resolution. By using a high-sensitivity 
detector, this system is able to retrieve the shape of the 
hidden target with an acquisition time of 0.8 s, paving the 
way for real-time 3D shape recovery of hidden objects. The 
accurate 3D shape recovery of this system could be further 
improved by fully exploiting the benefits of using a single- 
pixel camera for the acquisition. Indeed, a future improve- 
ment in this method would be to decrease the acquisition 
times by using compressive sampling [37]. By combin- 
ing compressive sensing and constant improvements of 
the detection and computational resources, recovery of 
the 3D shape of hidden moving objects with high spatial 
resolution should be possible. 